|

1.

Evolving Improved Sampling Protocols for Dose-Response Modelling Using Genetic Algorithms with a Profile-Likelihood Metric.

Lam, Nicholas N; Murray, Rua; Docherty, Paul D.

Bull Math Biol ; 86(6): 70, 2024 May 08.

Article En | MEDLINE | ID: mdl-38717656

Practical limitations of quality and quantity of data can limit the precision of parameter identification in mathematical models. Model-based experimental design approaches have been developed to minimise parameter uncertainty, but the majority of these approaches have relied on first-order approximations of model sensitivity at a local point in parameter space. Practical identifiability approaches such as profile-likelihood have shown potential for quantifying parameter uncertainty beyond linear approximations. This research presents a genetic algorithm approach to optimise sample timing across various parameterisations of a demonstrative PK-PD model with the goal of aiding experimental design. The optimisation relies on a chosen metric of parameter uncertainty that is based on the profile-likelihood method. Additionally, the approach considers cases where multiple parameter scenarios may require simultaneous optimisation. The genetic algorithm approach was able to locate near-optimal sampling protocols for a wide range of sample number (n = 3-20), and it reduced the parameter variance metric by 33-37% on average. The profile-likelihood metric also correlated well with an existing Monte Carlo-based metric (with a worst-case r > 0.89), while reducing computational cost by an order of magnitude. The combination of the new profile-likelihood metric and the genetic algorithm demonstrate the feasibility of considering the nonlinear nature of models in optimal experimental design at a reasonable computational cost. The outputs of such a process could allow for experimenters to either improve parameter certainty given a fixed number of samples, or reduce sample quantity while retaining the same level of parameter certainty.

Algorithms , Computer Simulation , Mathematical Concepts , Models, Biological , Monte Carlo Method , Likelihood Functions , Humans , Dose-Response Relationship, Drug , Research Design/statistics & numerical data , Models, Genetic , Uncertainty

2.

An evaluation of computational methods for aggregate data meta-analyses of diagnostic test accuracy studies.

Zhao, Yixin; Khan, Bilal; Negeri, Zelalem F.

BMC Med Res Methodol ; 24(1): 111, 2024 May 10.

Article En | MEDLINE | ID: mdl-38730436

BACKGROUND: A Generalized Linear Mixed Model (GLMM) is recommended to meta-analyze diagnostic test accuracy studies (DTAs) based on aggregate or individual participant data. Since a GLMM does not have a closed-form likelihood function or parameter solutions, computational methods are conventionally used to approximate the likelihoods and obtain parameter estimates. The most commonly used computational methods are the Iteratively Reweighted Least Squares (IRLS), the Laplace approximation (LA), and the Adaptive Gauss-Hermite quadrature (AGHQ). Despite being widely used, it has not been clear how these computational methods compare and perform in the context of an aggregate data meta-analysis (ADMA) of DTAs. METHODS: We compared and evaluated the performance of three commonly used computational methods for GLMM - the IRLS, the LA, and the AGHQ, via a comprehensive simulation study and real-life data examples, in the context of an ADMA of DTAs. By varying several parameters in our simulations, we assessed the performance of the three methods in terms of bias, root mean squared error, confidence interval (CI) width, coverage of the 95% CI, convergence rate, and computational speed. RESULTS: For most of the scenarios, especially when the meta-analytic data were not sparse (i.e., there were no or negligible studies with perfect diagnosis), the three computational methods were comparable for the estimation of sensitivity and specificity. However, the LA had the largest bias and root mean squared error for pooled sensitivity and specificity when the meta-analytic data were sparse. Moreover, the AGHQ took a longer computational time to converge relative to the other two methods, although it had the best convergence rate. CONCLUSIONS: We recommend practitioners and researchers carefully choose an appropriate computational algorithm when fitting a GLMM to an ADMA of DTAs. We do not recommend the LA for sparse meta-analytic data sets. However, either the AGHQ or the IRLS can be used regardless of the characteristics of the meta-analytic data.

Computer Simulation , Diagnostic Tests, Routine , Meta-Analysis as Topic , Humans , Diagnostic Tests, Routine/methods , Diagnostic Tests, Routine/standards , Diagnostic Tests, Routine/statistics & numerical data , Linear Models , Algorithms , Likelihood Functions , Sensitivity and Specificity

3.

Estimating genomic relationships of metafounders across and within breeds using maximum likelihood, pseudo-expectation-maximization maximum likelihood and increase of relationships.

Legarra, Andres; Bermann, Matias; Mei, Quanshun; Christensen, Ole F.

Genet Sel Evol ; 56(1): 35, 2024 May 02.

Article En | MEDLINE | ID: mdl-38698347

BACKGROUND: The theory of "metafounders" proposes a unified framework for relationships across base populations within breeds (e.g. unknown parent groups), and base populations across breeds (crosses) together with a sensible compatibility with genomic relationships. Considering metafounders might be advantageous in pedigree best linear unbiased prediction (BLUP) or single-step genomic BLUP. Existing methods to estimate relationships across metafounders Γ are not well adapted to highly unbalanced data, genotyped individuals far from base populations, or many unknown parent groups (within breed per year of birth). METHODS: We derive likelihood methods to estimate Γ . For a single metafounder, summary statistics of pedigree and genomic relationships allow deriving a cubic equation with the real root being the maximum likelihood (ML) estimate of Γ . This equation is tested with Lacaune sheep data. For several metafounders, we split the first derivative of the complete likelihood in a term related to Γ , and a second term related to Mendelian sampling variances. Approximating the first derivative by its first term results in a pseudo-EM algorithm that iteratively updates the estimate of Γ by the corresponding block of the H-matrix. The method extends to complex situations with groups defined by year of birth, modelling the increase of Γ using estimates of the rate of increase of inbreeding ( Δ F ), resulting in an expanded Γ and in a pseudo-EM+ Δ F algorithm. We compare these methods with the generalized least squares (GLS) method using simulated data: complex crosses of two breeds in equal or unsymmetrical proportions; and in two breeds, with 10 groups per year of birth within breed. We simulate genotyping in all generations or in the last ones. RESULTS: For a single metafounder, the ML estimates of the Lacaune data corresponded to the maximum. For simulated data, when genotypes were spread across all generations, both GLS and pseudo-EM(+ Δ F ) methods were accurate. With genotypes only available in the most recent generations, the GLS method was biased, whereas the pseudo-EM(+ Δ F ) approach yielded more accurate and unbiased estimates. CONCLUSIONS: We derived ML, pseudo-EM and pseudo-EM+ Δ F methods to estimate Γ in many realistic settings. Estimates are accurate in real and simulated data and have a low computational cost.

Breeding , Models, Genetic , Pedigree , Animals , Likelihood Functions , Breeding/methods , Algorithms , Sheep/genetics , Genomics/methods , Computer Simulation , Male , Female , Genotype

4.

Uncertainty Computation at Finite Distance in Nonlinear Mixed Effects Models-a New Method Based on Metropolis-Hastings Algorithm.

Guhl, Mélanie; Bertrand, Julie; Fayette, Lucie; Mercier, François; Comets, Emmanuelle.

AAPS J ; 26(3): 53, 2024 Apr 23.

Article En | MEDLINE | ID: mdl-38722435

The standard errors (SE) of the maximum likelihood estimates (MLE) of the population parameter vector in nonlinear mixed effect models (NLMEM) are usually estimated using the inverse of the Fisher information matrix (FIM). However, at a finite distance, i.e. far from the asymptotic, the FIM can underestimate the SE of NLMEM parameters. Alternatively, the standard deviation of the posterior distribution, obtained in Stan via the Hamiltonian Monte Carlo algorithm, has been shown to be a proxy for the SE, since, under some regularity conditions on the prior, the limiting distributions of the MLE and of the maximum a posterior estimator in a Bayesian framework are equivalent. In this work, we develop a similar method using the Metropolis-Hastings (MH) algorithm in parallel to the stochastic approximation expectation maximisation (SAEM) algorithm, implemented in the saemix R package. We assess this method on different simulation scenarios and data from a real case study, comparing it to other SE computation methods. The simulation study shows that our method improves the results obtained with frequentist methods at finite distance. However, it performed poorly in a scenario with the high variability and correlations observed in the real case study, stressing the need for calibration.

Algorithms , Computer Simulation , Monte Carlo Method , Nonlinear Dynamics , Uncertainty , Likelihood Functions , Bayes Theorem , Humans , Models, Statistical

5.

[Semiparametric analysis of nonparametric proportional hazards models with mixed dependent censored data].

Wang, S; Jiang, X; Zhao, B; Dong, H.

Nan Fang Yi Ke Da Xue Xue Bao ; 44(4): 689-696, 2024 Apr 20.

Article Zh | MEDLINE | ID: mdl-38708502

OBJECTIVE: To construct a nonparametric proportional hazards (PH) model for mixed informative interval-censored failure time data for predicting the risks in heart transplantation surgeries. METHODS: Based on the complexity of mixed informative interval-censored failure time data, we considered the interdependent relationship between failure time process and observation time process, constructed a nonparametric proportional hazards (PH) model to describe the nonlinear relationship between the risk factors and heart transplant surgery risks and proposed a two-step sieve estimation maximum likelihood algorithm. An estimation equation was established to estimate frailty variables using the observation process model. â -spline and B-spline were used to approximate the unknown baseline hazard function and nonparametric function, respectively, to obtain the working likelihood function in the sieve space. The partial derivative of the model parameters was used to obtain the scoring equation. The maximum likelihood estimation of the parameters was obtained by solving the scoring equation, and a function curve of the impact of risk factors on the risk of heart transplantation surgery was drawn. RESULTS: Simulation experiment suggested that the estimated values obtained by the proposed method were consistent and asymptotically effective under various settings with good fitting effects. Analysis of heart transplant surgery data showed that the donor's age had a positive linear relationship with the surgical risk. The impact of the recipient's age at disease onset increased at first and then stabilized, but increased against at an older age. The donor-recipient age difference had a positive linear relationship with the surgical risk of heart transplantation. CONCLUSION: The nonparametric PH model established in this study can be used for predicting the risks in heart transplantation surgery and exploring the functional relationship between the surgery risks and the risk factors.

Heart Transplantation , Proportional Hazards Models , Humans , Risk Factors , Algorithms , Likelihood Functions

6.

MOLECULAR PHYLOGENY OF THE LEECH GENUS PONTOBDELLA (HIRUDINIDA: PISCICOLIDAE) WITH NOTES ON PONTOBDELLA CALIFORNIANA AND PONTOBDELLA MACROTHELA.

Ruiz-Escobar, Fernando; Torres-Carrera, Gerardo; Islas-Villanueva, Valentina; Oceguera-Figueroa, Alejandro.

J Parasitol ; 110(3): 186-194, 2024 May 01.

Article En | MEDLINE | ID: mdl-38700436

Leech specimens of the genus Pontobdella (Hirudinida: Piscicolidae) were found off the coast of the state of Oaxaca (Pacific) as well as in Veracruz and Tabasco (Gulf of Mexico), Mexico. Based on the specimens collected in Oaxaca, a redescription of Pontobdella californiana is provided, with emphasis on the differences in the reproductive organs with the original description of the species. In addition, leech cocoons assigned to P. californiana were found attached to items hauled by gillnets and studied using scanning electron microscopy and molecular approaches. Samples of Pontobdella macrothela were found in both Pacific and Atlantic oceans, representing new geographic records. The phylogenetic position of P. californiana is investigated for the first time, and with the addition of Mexican samples of both species, the phylogenetic relationships within Pontobdella are reinvestigated. Parsimony and maximum-likelihood phylogenetic analysis were based on mitochondrial (cytochrome oxidase subunit I [COI] and 12S rRNA) and nuclear (18S rRNA and 28S rRNA) DNA sequences. Based on our results, we confirm the monophyly of Pontobdella and the pantropical distribution of P. macrothela with a new record in the Tropical Eastern Pacific.

Leeches , Microscopy, Electron, Scanning , Phylogeny , Animals , Leeches/classification , Leeches/genetics , Leeches/anatomy & histology , Mexico , Microscopy, Electron, Scanning/veterinary , Pacific Ocean , Atlantic Ocean , DNA, Ribosomal/chemistry , RNA, Ribosomal, 28S/genetics , Fish Diseases/parasitology , Gulf of Mexico/epidemiology , Electron Transport Complex IV/genetics , Ectoparasitic Infestations/parasitology , Ectoparasitic Infestations/veterinary , RNA, Ribosomal, 18S/genetics , Molecular Sequence Data , Sequence Alignment/veterinary , Likelihood Functions , Fishes/parasitology

7.

MPH: fast REML for large-scale genome partitioning of quantitative genetic variation.

Jiang, Jicai.

Bioinformatics ; 40(5)2024 May 02.

Article En | MEDLINE | ID: mdl-38688661

MOTIVATION: Genome partitioning of quantitative genetic variation is useful for dissecting the genetic architecture of complex traits. However, existing methods, such as Haseman-Elston regression and linkage disequilibrium score regression, often face limitations when handling extensive farm animal datasets, as demonstrated in this study. RESULTS: To overcome this challenge, we present MPH, a novel software tool designed for efficient genome partitioning analyses using restricted maximum likelihood. The computational efficiency of MPH primarily stems from two key factors: the utilization of stochastic trace estimators and the comprehensive implementation of parallel computation. Evaluations with simulated and real datasets demonstrate that MPH achieves comparable accuracy and significantly enhances convergence, speed, and memory efficiency compared to widely used tools like GCTA and LDAK. These advancements facilitate large-scale, comprehensive analyses of complex genetic architectures in farm animals. AVAILABILITY AND IMPLEMENTATION: The MPH software is available at https://jiang18.github.io/mph/.

Genetic Variation , Software , Animals , Genome , Quantitative Trait Loci , Likelihood Functions , Linkage Disequilibrium , Genomics/methods

8.

Optimizing scan time and bayesian penalized likelihood reconstruction algorithm in copper-64 PET/CT imaging: a phantom study.

Monsef, Abbas; Sheikhzadeh, Peyman; Steiner, Joseph R; Sadeghi, Fatemeh; Yazdani, Mohammadreza; Ghafarian, Pardis.

Biomed Phys Eng Express ; 10(4)2024 May 14.

Article En | MEDLINE | ID: mdl-38608316

Objectives: The aim of this study was to evaluate Cu-64 PET phantom image quality using Bayesian Penalized Likelihood (BPL) and Ordered Subset Expectation Maximum with point-spread function modeling (OSEM-PSF) reconstruction algorithms. In the BPL, the regularization parameterßwas varied to identify the optimum value for image quality. In the OSEM-PSF, the effect of acquisition time was evaluated to assess the feasibility of shortened scan duration.Methods: A NEMA IEC PET body phantom was filled with known activities of water soluble Cu-64. The phantom was imaged on a PET/CT scanner and was reconstructed using BPL and OSEM-PSF algorithms. For the BPL reconstruction, variousßvalues (150, 250, 350, 450, and 550) were evaluated. For the OSEM-PSF algorithm, reconstructions were performed using list-mode data intervals ranging from 7.5 to 240 s. Image quality was assessed by evaluating the signal to noise ratio (SNR), contrast to noise ratio (CNR), and background variability (BV).Results: The SNR and CNR were higher in images reconstructed with BPL compared to OSEM-PSF. Both the SNR and CNR increased with increasingß, peaking atß= 550. The CNR for allß, sphere sizes and tumor-to-background ratios (TBRs) satisfied the Rose criterion for image detectability (CNR > 5). BPL reconstructed images withß= 550 demonstrated the highest improvement in image quality. For OSEM-PSF reconstructed images with list-mode data duration ≥ 120 s, the noise level and CNR were not significantly different from the baseline 240 s list-mode data duration.Conclusions: BPL reconstruction improved Cu-64 PET phantom image quality by increasing SNR and CNR relative to OSEM-PSF reconstruction. Additionally, this study demonstrated scan time can be reduced from 240 to 120 s when using OSEM-PSF reconstruction while maintaining similar image quality. This study provides baseline data that may guide future studies aimed to improve clinical Cu-64 imaging.

Algorithms , Bayes Theorem , Copper Radioisotopes , Image Processing, Computer-Assisted , Phantoms, Imaging , Positron Emission Tomography Computed Tomography , Signal-To-Noise Ratio , Positron Emission Tomography Computed Tomography/methods , Image Processing, Computer-Assisted/methods , Likelihood Functions , Humans

9.

Shape restricted additive hazards models: Monotone, unimodal, and U-shape hazard functions.

Chung, Yunro; Ivanova, Anastasia; Fine, Jason P.

Stat Med ; 43(9): 1671-1687, 2024 Apr 30.

Article En | MEDLINE | ID: mdl-38634251

We consider estimation of the semiparametric additive hazards model with an unspecified baseline hazard function where the effect of a continuous covariate has a specific shape but otherwise unspecified. Such estimation is particularly useful for a unimodal hazard function, where the hazard is monotone increasing and monotone decreasing with an unknown mode. A popular approach of the proportional hazards model is limited in such setting due to the complicated structure of the partial likelihood. Our model defines a quadratic loss function, and its simple structure allows a global Hessian matrix that does not involve parameters. Thus, once the global Hessian matrix is computed, a standard quadratic programming method can be applicable by profiling all possible locations of the mode. However, the quadratic programming method may be inefficient to handle a large global Hessian matrix in the profiling algorithm due to a large dimensionality, where the dimension of the global Hessian matrix and number of hypothetical modes are the same order as the sample size. We propose the quadratic pool adjacent violators algorithm to reduce computational costs. The proposed algorithm is extended to the model with a time-dependent covariate with monotone or U-shape hazard function. In simulation studies, our proposed method improves computational speed compared to the quadratic programming method, with bias and mean square error reductions. We analyze data from a recent cardiovascular study.

Algorithms , Humans , Proportional Hazards Models , Computer Simulation , Probability , Bias , Likelihood Functions

10.

Leveraging new methods for comprehensive characterization of mitochondrial DNA in esophageal squamous cell carcinoma.

Zhuang, Xuehan; Ye, Rui; Zhou, Yong; Cheng, Matthew Yibo; Cui, Heyang; Wang, Longlong; Zhang, Shuangping; Wang, Shubin; Cui, Yongping; Zhang, Weimin.

Genome Med ; 16(1): 50, 2024 Apr 02.

Article En | MEDLINE | ID: mdl-38566210

BACKGROUND: Mitochondria play essential roles in tumorigenesis; however, little is known about the contribution of mitochondrial DNA (mtDNA) to esophageal squamous cell carcinoma (ESCC). Whole-genome sequencing (WGS) is by far the most efficient technology to fully characterize the molecular features of mtDNA; however, due to the high redundancy and heterogeneity of mtDNA in regular WGS data, methods for mtDNA analysis are far from satisfactory. METHODS: Here, we developed a likelihood-based method dMTLV to identify low-heteroplasmic mtDNA variants. In addition, we described fNUMT, which can simultaneously detect non-reference nuclear sequences of mitochondrial origin (non-ref NUMTs) and their derived artifacts. Using these new methods, we explored the contribution of mtDNA to ESCC utilizing the multi-omics data of 663 paired tumor-normal samples. RESULTS: dMTLV outperformed the existing methods in sensitivity without sacrificing specificity. The verification using Nanopore long-read sequencing data showed that fNUMT has superior specificity and more accurate breakpoint identification than the current methods. Leveraging the new method, we identified a significant association between the ESCC overall survival and the ratio of mtDNA copy number of paired tumor-normal samples, which could be potentially explained by the differential expression of genes enriched in pathways related to metabolism, DNA damage repair, and cell cycle checkpoint. Additionally, we observed that the expression of CBWD1 was downregulated by the non-ref NUMTs inserted into its intron region, which might provide precursor conditions for the tumor cells to adapt to a hypoxic environment. Moreover, we identified a strong positive relationship between the number of mtDNA truncating mutations and the contribution of signatures linked to tumorigenesis and treatment response. CONCLUSIONS: Our new frameworks promote the characterization of mtDNA features, which enables the elucidation of the landscapes and roles of mtDNA in ESCC essential for extending the current understanding of ESCC etiology. dMTLV and fNUMT are freely available from https://github.com/sunnyzxh/dMTLV and https://github.com/sunnyzxh/fNUMT , respectively.

Esophageal Neoplasms , Esophageal Squamous Cell Carcinoma , Humans , Esophageal Squamous Cell Carcinoma/genetics , DNA, Mitochondrial/genetics , DNA, Mitochondrial/analysis , DNA, Mitochondrial/metabolism , Esophageal Neoplasms/genetics , Esophageal Neoplasms/metabolism , Esophageal Neoplasms/pathology , Likelihood Functions , Mitochondria/genetics , Carcinogenesis

11.

Addressing age measurement errors in fish growth estimation from length-stratified samples.

Zheng, Nan; Kheirollahi, Atefeh; Yilmaz, Yildiz.

Biometrics ; 80(2)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38647000

Fish growth models are crucial for fisheries stock assessments and are commonly estimated using fish length-at-age data. This data is widely collected using length-stratified age sampling (LSAS), a cost-effective two-phase response-selective sampling method. The data may contain age measurement errors (MEs). We propose a methodology that accounts for both LSAS and age MEs to accurately estimate fish growth. The proposed methods use empirical proportion likelihood methodology for LSAS and the structural errors in variables methodology for age MEs. We provide a measure of uncertainty for parameter estimates and standardized residuals for model validation. To model the age distribution, we employ a continuation ratio-logit model that is consistent with the random nature of the true age distribution. We also apply a discretization approach for age and length distributions, which significantly improves computational efficiency and is consistent with the discrete age and length data typically encountered in practice. Our simulation study shows that neglecting age MEs can lead to significant bias in growth estimation, even with small but non-negligible age MEs. However, our new approach performs well regardless of the magnitude of age MEs and accurately estimates SEs of parameter estimators. Real data analysis demonstrates the effectiveness of the proposed model validation device. Computer codes to implement the methodology are provided.

Computer Simulation , Fishes , Animals , Fishes/growth & development , Models, Statistical , Fisheries/statistics & numerical data , Biometry/methods , Likelihood Functions , Bias

12.

Deep partially linear cox model for current status data.

Wu, Qiang; Tong, Xingwei; Zhao, Xingqiu.

Biometrics ; 80(2)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38563532

Deep learning has continuously attained huge success in diverse fields, while its application to survival data analysis remains limited and deserves further exploration. For the analysis of current status data, a deep partially linear Cox model is proposed to circumvent the curse of dimensionality. Modeling flexibility is attained by using deep neural networks (DNNs) to accommodate nonlinear covariate effects and monotone splines to approximate the baseline cumulative hazard function. We establish the convergence rate of the proposed maximum likelihood estimators. Moreover, we derive that the finite-dimensional estimator for treatment covariate effects is $\sqrt{n}$-consistent, asymptotically normal, and attains semiparametric efficiency. Finally, we demonstrate the performance of our procedures through extensive simulation studies and application to real-world data on news popularity.

Proportional Hazards Models , Likelihood Functions , Survival Analysis , Computer Simulation , Linear Models

13.

Bootstrap tests for simultaneous monotone ordering of effects in a two-way ANOVA.

Dey, Raju; Mondal, Anjana; Kumar, Somesh.

Biom J ; 66(3): e2300238, 2024 Apr.

Article En | MEDLINE | ID: mdl-38581103

In a two-way additive analysis of variance (ANOVA) model, we consider the problem of testing for homogeneity of both row and column effects against their simultaneous ordering. The error variances are assumed to be heterogeneous with unbalanced samples in each cell. Two simultaneous test procedures are developed-the first one using the likelihood ratio test (LRT) statistics of two independent hypotheses and another based on the consecutive pairwise differences of estimators of effects. The parametric bootstrap (PB) approach is used to find critical points of both the tests and the asymptotic accuracy of the bootstrap is established. An extensive simulation study shows that the proposed tests achieve the nominal size and have very good power performance. The robustness of the tests is also analyzed under deviation from normality. An "R" package is developed and shared on "GitHub" for ease of implementation of users. The proposed tests are illustrated using a real data set on the mortality due to alcoholic liver disease and it is shown that age and gender have a significant impact on the increasing incidence of mortality.

Models, Statistical , Analysis of Variance , Computer Simulation , Likelihood Functions

14.

Recombination analysis on the receptor switching event of MERS-CoV and its close relatives: implications for the emergence of MERS-CoV.

Tolentino, Jarel Elgin; Lytras, Spyros; Ito, Jumpei; Sato, Kei.

Virol J ; 21(1): 84, 2024 04 10.

Article En | MEDLINE | ID: mdl-38600521

BACKGROUND: PlMERS-CoV is a coronavirus known to cause severe disease in humans, taxonomically classified under the subgenus Merbecovirus. Recent findings showed that the close relatives of MERS-CoV infecting vespertillionid bats (family Vespertillionidae), named NeoCoV and PDF-2180, use their hosts' ACE2 as their entry receptor, unlike the DPP4 receptor usage of MERS-CoV. Previous research suggests that this difference in receptor usage between these related viruses is a result of recombination. However, the precise location of the recombination breakpoints and the details of the recombination event leading to the change of receptor usage remain unclear. METHODS: We used maximum likelihood-based phylogenetics and genetic similarity comparisons to characterise the evolutionary history of all complete Merbecovirus genome sequences. Recombination events were detected by multiple computational methods implemented in the recombination detection program. To verify the influence of recombination, we inferred the phylogenetic relation of the merbecovirus genomes excluding recombinant segments and that of the viruses' receptor binding domains and examined the level of congruency between the phylogenies. Finally, the geographic distribution of the genomes was inspected to identify the possible location where the recombination event occurred. RESULTS: Similarity plot analysis and the recombination-partitioned phylogenetic inference showed that MERS-CoV is highly similar to NeoCoV (and PDF-2180) across its whole genome except for the spike-encoding region. This is confirmed to be due to recombination by confidently detecting a recombination event between the proximal ancestor of MERS-CoV and a currently unsampled merbecovirus clade. Notably, the upstream recombination breakpoint was detected in the N-terminal domain and the downstream breakpoint at the S2 subunit of spike, indicating that the acquired recombined fragment includes the receptor-binding domain. A tanglegram comparison further confirmed that the receptor binding domain-encoding region of MERS-CoV was acquired via recombination. Geographic mapping analysis on sampling sites suggests the possibility that the recombination event occurred in Africa. CONCLUSION: Together, our results suggest that recombination can lead to receptor switching of merbecoviruses during circulation in bats. These results are useful for future epidemiological assessments and surveillance to understand the spillover risk of bat coronaviruses to the human population.

Chiroptera , Coronavirus Infections , Middle East Respiratory Syndrome Coronavirus , Animals , Humans , Middle East Respiratory Syndrome Coronavirus/genetics , Phylogeny , Likelihood Functions , Coronavirus Infections/veterinary , Coronavirus Infections/epidemiology , Recombination, Genetic , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/metabolism

15.

Informing policy via dynamic models: Cholera in Haiti.

Wheeler, Jesse; Rosengart, AnnaElaine; Jiang, Zhuoxun; Tan, Kevin; Treutle, Noah; Ionides, Edward L.

PLoS Comput Biol ; 20(4): e1012032, 2024 Apr.

Article En | MEDLINE | ID: mdl-38683863

Public health decisions must be made about when and how to implement interventions to control an infectious disease epidemic. These decisions should be informed by data on the epidemic as well as current understanding about the transmission dynamics. Such decisions can be posed as statistical questions about scientifically motivated dynamic models. Thus, we encounter the methodological task of building credible, data-informed decisions based on stochastic, partially observed, nonlinear dynamic models. This necessitates addressing the tradeoff between biological fidelity and model simplicity, and the reality of misspecification for models at all levels of complexity. We assess current methodological approaches to these issues via a case study of the 2010-2019 cholera epidemic in Haiti. We consider three dynamic models developed by expert teams to advise on vaccination policies. We evaluate previous methods used for fitting these models, and we demonstrate modified data analysis strategies leading to improved statistical fit. Specifically, we present approaches for diagnosing model misspecification and the consequent development of improved models. Additionally, we demonstrate the utility of recent advances in likelihood maximization for high-dimensional nonlinear dynamic models, enabling likelihood-based inference for spatiotemporal incidence data using this class of models. Our workflow is reproducible and extendable, facilitating future investigations of this disease system.

Cholera , Haiti/epidemiology , Cholera/epidemiology , Cholera/transmission , Cholera/prevention & control , Humans , Computational Biology/methods , Epidemics/statistics & numerical data , Epidemics/prevention & control , Epidemiological Models , Health Policy , Likelihood Functions , Stochastic Processes , Models, Statistical

16.

Interval estimation in three-class receiver operating characteristic analysis: A fairly general approach based on the empirical likelihood.

To, Duc-Khanh; Adimari, Gianfranco; Chiogna, Monica.

Stat Methods Med Res ; 33(5): 875-893, 2024 May.

Article En | MEDLINE | ID: mdl-38502023

The empirical likelihood is a powerful nonparametric tool, that emulates its parametric counterpart-the parametric likelihood-preserving many of its large-sample properties. This article tackles the problem of assessing the discriminatory power of three-class diagnostic tests from an empirical likelihood perspective. In particular, we concentrate on interval estimation in a three-class receiver operating characteristic analysis, where a variety of inferential tasks could be of interest. We present novel theoretical results and tailored techniques studied to efficiently solve some of such tasks. Extensive simulation experiments are provided in a supporting role, with our novel proposals compared to existing competitors, when possible. It emerges that our new proposals are extremely flexible, being able to compete with contestants and appearing suited to accommodating several distributions, such, for example, mixtures, for target populations. We illustrate the application of the novel proposals with a real data example. The article ends with a discussion and a presentation of some directions for future research.

ROC Curve , Likelihood Functions , Humans , Diagnostic Tests, Routine/statistics & numerical data , Models, Statistical , Computer Simulation

17.

On variable selection in a semiparametric AFT mixture cure model.

Parsa, Motahareh; Taghavi-Shahri, Seyed Mahmood; Van Keilegom, Ingrid.

Lifetime Data Anal ; 30(2): 472-500, 2024 Apr.

Article En | MEDLINE | ID: mdl-38436831

In clinical studies, one often encounters time-to-event data that are subject to right censoring and for which a fraction of the patients under study never experience the event of interest. Such data can be modeled using cure models in survival analysis. In the presence of cure fraction, the mixture cure model is popular, since it allows to model probability to be cured (called the incidence) and the survival function of the uncured individuals (called the latency). In this paper, we develop a variable selection procedure for the incidence and latency parts of a mixture cure model, consisting of a logistic model for the incidence and a semiparametric accelerated failure time model for the latency. We use a penalized likelihood approach, based on adaptive LASSO penalties for each part of the model, and we consider two algorithms for optimizing the criterion function. Extensive simulations are carried out to assess the accuracy of the proposed selection procedure. Finally, we employ the proposed method to a real dataset regarding heart failure patients with left ventricular systolic dysfunction.

Algorithms , Models, Statistical , Humans , Likelihood Functions , Survival Analysis , Logistic Models , Computer Simulation

18.

Estimating the size of a closed population by modeling latent and observed heterogeneity.

Bartolucci, Francesco; Forcina, Antonio.

Biometrics ; 80(2)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38536746

The paper extends the empirical likelihood (EL) approach of Liu et al. to a new and very flexible family of latent class models for capture-recapture data also allowing for serial dependence on previous capture history, conditionally on latent type and covariates. The EL approach allows to estimate the overall population size directly rather than by adding estimates conditional to covariate configurations. A Fisher-scoring algorithm for maximum likelihood estimation is proposed and a more efficient alternative to the traditional EL approach for estimating the non-parametric component is introduced; this allows us to show that the mapping between the non-parametric distribution of the covariates and the probabilities of being never captured is one-to-one and strictly increasing. Asymptotic results are outlined, and a procedure for constructing profile likelihood confidence intervals for the population size is presented. Two examples based on real data are used to illustrate the proposed approach and a simulation study indicates that, when estimating the overall undercount, the method proposed here is substantially more efficient than the one based on conditional maximum likelihood estimation, especially when the sample size is not sufficiently large.

Models, Statistical , Likelihood Functions , Computer Simulation , Population Density , Sample Size

19.

Adjusting Incidence Estimates with Laboratory Test Performances: A Pragmatic Maximum Likelihood Estimation-Based Approach.

Weng, Yingjie; Tian, Lu; Boothroyd, Derek; Lee, Justin; Zhang, Kenny; Lu, Di; Lindan, Christina P; Bollyky, Jenna; Huang, Beatrice; Rutherford, George W; Maldonado, Yvonne; Desai, Manisha.

Epidemiology ; 35(3): 295-307, 2024 May 01.

Article En | MEDLINE | ID: mdl-38465940

Understanding the incidence of disease is often crucial for public policy decision-making, as observed during the COVID-19 pandemic. Estimating incidence is challenging, however, when the definition of incidence relies on tests that imperfectly measure disease, as in the case when assays with variable performance are used to detect the SARS-CoV-2 virus. To our knowledge, there are no pragmatic methods to address the bias introduced by the performance of labs in testing for the virus. In the setting of a longitudinal study, we developed a maximum likelihood estimation-based approach to estimate laboratory performance-adjusted incidence using the expectation-maximization algorithm. We constructed confidence intervals (CIs) using both bootstrapped-based and large-sample interval estimator approaches. We evaluated our methods through extensive simulation and applied them to a real-world study (TrackCOVID), where the primary goal was to determine the incidence of and risk factors for SARS-CoV-2 infection in the San Francisco Bay Area from July 2020 to March 2021. Our simulations demonstrated that our method converged rapidly with accurate estimates under a variety of scenarios. Bootstrapped-based CIs were comparable to the large-sample estimator CIs with a reasonable number of incident cases, shown via a simulation scenario based on the real TrackCOVID study. In more extreme simulated scenarios, the coverage of large-sample interval estimation outperformed the bootstrapped-based approach. Results from the application to the TrackCOVID study suggested that assuming perfect laboratory test performance can lead to an inaccurate inference of the incidence. Our flexible, pragmatic method can be extended to a variety of disease and study settings.

COVID-19 , Pandemics , Humans , Likelihood Functions , Incidence , Longitudinal Studies , Computer Simulation , COVID-19/epidemiology

20.

Investigating the potential of single-cell DNA methylation data to detect allele-specific methylation and imprinting.

Johnson, Nicholas D; Cutler, David J; Conneely, Karen N.

Am J Hum Genet ; 111(4): 654-667, 2024 Apr 04.

Article En | MEDLINE | ID: mdl-38471507

Allele-specific methylation (ASM) is an epigenetic modification whereby one parental allele becomes methylated and the other unmethylated at a specific locus. ASM is most often driven by the presence of nearby heterozygous variants that influence methylation, but also occurs somatically in the context of genomic imprinting. In this study, we investigate ASM using publicly available single-cell reduced representation bisulfite sequencing (scRRBS) data on 608 B cells sampled from six healthy B cell samples and 1,230 cells from 11 chronic lymphocytic leukemia (CLL) samples. We developed a likelihood-based criterion to test whether a CpG exhibited ASM, based on the distributions of methylated and unmethylated reads both within and across cells. Applying our likelihood ratio test, 65,998 CpG sites exhibited ASM in healthy B cell samples according to a Bonferroni criterion (p < 8.4 × 10-9), and 32,862 CpG sites exhibited ASM in CLL samples (p < 8.5 × 10-9). We also called ASM at the sample level. To evaluate the accuracy of our method, we called heterozygous variants from the scRRBS data, which enabled variant-based calls of ASM within each cell. Comparing sample-level ASM calls to the variant-based measures of ASM, we observed a positive predictive value of 76%-100% across samples. We observed high concordance of ASM across samples and an overrepresentation of ASM in previously reported imprinted genes and genes with imprinting binding motifs. Our study demonstrates that single-cell bisulfite sequencing is a potentially powerful tool to investigate ASM, especially as studies expand to increase the number of samples and cells sequenced.

DNA Methylation , Leukemia, Lymphocytic, Chronic, B-Cell , Sulfites , Humans , DNA Methylation/genetics , Alleles , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Likelihood Functions , Genomic Imprinting/genetics , CpG Islands/genetics